Behaviour change between 1.5.6 and 1.5.2
Pyparsing 1.5.6 transforms certain expressions more aggressively than 1.5.2 did, in particular in connection with setResultsName. Consider the following example with a stripped-down grammar that is manipulated to produce a tree in a separate function (this is supposed to illustrate why this bites me):
    def getGrammar():
        from pyparsing import (Forward, CaselessKeyword, Word, alphas,
            ParserElement, ZeroOrMore, Literal, Optional)
        tableName = Word(alphas)
        joinedTable = Word("+-")
        tableReference = (joinedTable
            | tableName)
        fromClause = (CaselessKeyword("FROM")
            + tableReference)("fromClause")
        return dict((k, v) for k, v in locals().iteritems()
            if isinstance(v, ParserElement))

    def enableTree(syms):
        def makeAction(name):
            def action(s, pos, toks):
                return [name, toks]
            return action
        for name in syms:
            syms[name].addParseAction(makeAction(name))

    if __name__ == "__main__":
        import pprint
        syms = getGrammar()
        enableTree(syms)
        pprint.pprint(syms["fromClause"].parseString("FROM ab").asList())
With pyparsing 1.5.2, the subexpressions still show up in the printed tree, whereas 1.5.6 folds the subexpressions into fromClause (but only when fromClause carries a result name); with this particular scheme, this has user-visible consequences in that the program prints
['fromClause', ['FROM', 'ab']]
I admit I've not really traced this yet, but since I suspected that streamline is now being called more liberally, I tried to inhibit its actions by calling setName on the RHS symbols in the fromClause rule before use. Alas, to no avail.
So -- is there anything more sensible I can do to get 1.5.2 results from 1.5.6? Or do you consider my scheme of adding actions long after the symbols have been defined as too harebrained?
— Markus
Parse Result Object Behaves in Unexpected ways . . .
Running getattr() on any parse result object with a name it does not define returns an empty string. This can lead to very confusing behavior, for example when asList() is misspelled:
    result.aslist()
    TypeError: 'str' object is not callable
One would normally expect an AttributeError here. This could lead to a lot of problems passing quietly . . .
-Matt G. (meawoppl at some google mail service)
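The effect can be reproduced without pyparsing. The following is a minimal sketch, assuming a hypothetical stand-in class `LenientResults` whose `__getattr__` returns an empty string for unknown names, as described above. It is not pyparsing's actual implementation.

```python
class LenientResults(object):
    """Hypothetical stand-in for a results object; not pyparsing's ParseResults."""

    def __init__(self, tokens):
        self._tokens = list(tokens)

    def asList(self):
        return list(self._tokens)

    def __getattr__(self, name):
        # Only called when normal lookup fails: every unknown name
        # silently yields "" instead of raising AttributeError.
        return ""


result = LenientResults(["FROM", "ab"])

# The misspelled method name is looked up "successfully" as "" ...
misspelled = result.aslist

# ... so the failure only shows up when that string gets called.
try:
    result.aslist()
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__
```

Raising AttributeError from `__getattr__` for unknown names would surface the typo at the lookup itself, at the cost of no longer being able to treat missing results names as empty.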
Extending ParseResults class
Hi, I've tried to extend this class into a new one, "CodeItem", that automatically handles code location for itself and its sub-items.
I think it's convenient for reporting semantic errors. There's also a simple class, ParsingError, that takes a message and the problem CodeItems as parameters.
The biggest difficulty I had is that ParseResults changes its "appearance" depending on whether the instance is named or not, so I've re-implemented getName(), and my __getitem__() may be a little sketchy.
Does it make sense for a parser to remember each item's offset automatically? Maybe it will cost some speed - not sure how critical this loss would be though.
Also I've created a wrapper decorator function that would adapt parseAction handler functions to use CodeItems instead of (s,loc,tok) arguments.
-Evgeny.
    from pyparsing import ParseResults, col, lineno

    class ParsingError(Exception):
        def __init__(self, msg, *items):
            self.msg = msg
            self.items = items
        def __str__(self):
            out = [t.info() for t in self.items]
            return 'parsing error: %s\nproblem item(s):\n%s' \
                % (self.msg, '\n'.join(out))

    def tok2str(tok):
        # flatten a token (string, list or ParseResults) into one string
        if isinstance(tok, ParseResults):
            items = tok.asList()
            return tok2str(items)
        elif type(tok) == type([]):
            out = ''
            for item in tok:
                out = out + tok2str(item)
            return out
        elif isinstance(tok, str):
            return tok
        else:
            raise Exception('internal error type=%s' % type(tok))

    def toklen(tok):
        s = tok2str(tok)
        return len(s)

    class CodeItem(ParseResults):
        def __init__(self, s, loc, t):
            if isinstance(t, ParseResults):
                name = t.getName()
                tlist = t.asList()
            else:
                name = None
                tlist = t
            ParseResults.__init__(self, tlist)
            self.__ci_name = name
            self.__ci_source = s
            self.__ci_loc = loc
        def getName(self):
            return self.__ci_name
        def __str__(self):
            tok = ParseResults.__str__(self)
            lineno = self.lineno()
            col = self.col()
            return 'line=%d col=%d tokens=%s' % (lineno, col, tok)
        def __repr__(self):
            return ParseResults.__str__(self)
        def info(self):
            src = self.source()
            line = self.lineno()
            col = self.col()
            return '(line=%3d col=%3d) %s' % (line, col, src)
        def col(self):
            # calls the module-level pyparsing col(), not this method
            return col(self.__ci_loc, self.__ci_source)
        def lineno(self):
            # calls the module-level pyparsing lineno(), not this method
            return lineno(self.__ci_loc, self.__ci_source)
        def source(self):
            return tok2str(self)
        def __getitem__(self, i):
            #this function could be made faster by optimizing toklen function()
            t = ParseResults.__getitem__(self, i)
            #here I might want to support slicing as well
            if isinstance(i, int):
                offset = 0
                for j in range(i):
                    tj = ParseResults.__getitem__(self, j)
                    offset += toklen(tj)
                loc = self.__ci_loc + offset
                s = self.__ci_source
                return CodeItem(s, loc, t)
            else:
                #here is a hole, maybe CodeItem should be constructed instead
                #I had problems with named ParseResults
                return t
Here is a wrapper function that converts tokens into CodeItems and can be used as a decorator for parseAction functions as defined in pyparsing.py.
This one takes only named parsing results and reconfigures the parseAction to take CodeItem instead of s,loc,tok arguments.
    import functools

    def wrap_named_tokens(f):
        """filters named tokens
        """
        def wrapper(self, s, loc, tok):
            #code_items = TokenList(s,loc,tok)
            code_items = []
            cloc = loc
            for t in tok:
                inc = toklen(t)
                if isinstance(t, ParseResults) and t.getName() != None:
                    code_items.append(CodeItem(s, cloc, t))
                cloc = cloc + inc
            f(self, code_items)
        return functools.update_wrapper(wrapper, f)
Example of parseAction definition:
    @wrap_named_tokens
    def some_parse_action(self, code_items):
        # the wrapper calls f(self, code_items), so this is meant to be a method
        for item in code_items:
            if is_not_good(item):
                raise ParsingError('this code has error', item)
            else:
                do_something_with(item)
The error handler will print line and column numbers automatically.
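For reference, line and column numbers can be derived from nothing more than the source string and a character offset, which is what CodeItem relies on. The following is a minimal sketch in plain Python. The helper names `line_of` and `column_of` are made up for this illustration; they only mirror the idea behind pyparsing's lineno() and col(), not their exact code.

```python
def line_of(loc, source):
    """1-based line number of character offset loc in source."""
    return source.count("\n", 0, loc) + 1


def column_of(loc, source):
    """1-based column of character offset loc in source."""
    # rfind returns -1 when there is no earlier newline, giving loc + 1.
    return loc - source.rfind("\n", 0, loc)


source = "first line\nsecond line\nthird"
loc = source.index("second")
position = (line_of(loc, source), column_of(loc, source))
```

Both helpers rescan the source on every call, so storing the offset per item and computing the position only when an error is reported keeps the cost out of the normal parsing path.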
Is there a best practice for parsing mixed content?
Currently I'm having a hard time using parseString() to analyze a wiki paragraph containing mixed content. For example:
Wiki paragraphs can contain [[links]] as well as **bold** and //italic// text.
What would the rules for this paragraph look like, such that the text between the wiki markup is preserved as well? Are there any examples I could have a look at?
For this kind of parsing, parseString is not the best method to use. Just for review, there are now 4 different ways to invoke a pyparsing grammar:
- parseString - parses input string from the beginning, until a mismatch is found or the end of the grammar
- scanString - a generator for partial string matching; returns the matched tokens, and start and end locations of the match
- transformString - wrapper around scanString to apply parse actions to transform the input string
- searchString - wrapper around scanString to return a list of the matched tokens
As you have found, parseString is suitable only if you have a grammar that completely defines the content of the input text. scanString is able to "scan" through the input text, looking for matches - this is closer to what you want, since it only requires definition of pyparsing expressions for that which you are scanning for. transformString and searchString are simple wrappers around scanString, for the most common applications of scanString: converting expressions based on parseActions, and searching for matches and returning a list of matches. So for a wiki markup processor, I'd say transformString is the best fit. In fact, there is a new example on the Examples page titled simpleWiki.py. The one complication is when you get markup nested within markup, but with a little diligence, I hope you can get it worked out.
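The scan-and-transform idea can be illustrated without pyparsing. The sketch below uses the standard `re` module instead of transformString, so it is an analogy rather than the approach taken in simpleWiki.py. The HTML output format and the name `wiki_to_html` are assumptions made for this example. Only the markup is matched, and everything between matches passes through unchanged.

```python
import re

# One alternative per markup form; text outside the matches is left alone.
MARKUP = re.compile(
    r"\[\[(?P<link>.+?)\]\]"
    r"|\*\*(?P<bold>.+?)\*\*"
    r"|//(?P<italic>.+?)//"
)


def _convert(match):
    """Replacement callback, playing the role of a parse action."""
    if match.group("link") is not None:
        target = match.group("link")
        return '<a href="%s">%s</a>' % (target, target)
    if match.group("bold") is not None:
        return "<b>%s</b>" % match.group("bold")
    return "<i>%s</i>" % match.group("italic")


def wiki_to_html(text):
    return MARKUP.sub(_convert, text)


paragraph = ("Wiki paragraphs can contain [[links]] as well as "
             "**bold** and //italic// text.")
html = wiki_to_html(paragraph)
```

Like the flat grammar it imitates, this sketch does not handle markup nested within markup.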
Unicode issues
When parsing Unicode strings, PyParsing returns a mixture of unicode and str objects as a result (ASCII strings are always converted to str, others are left intact). This probably should not happen, and intermixing byte strings with Unicode strings is usually not a good idea. I suggest the following patch:
--- pyparsing.py.orig 2008-04-21 23:18:59.000000000 +0600
+++ pyparsing.py 2008-04-21 23:21:53.000000000 +0600
@@ -87,6 +87,11 @@
str(obj). If that fails with a UnicodeEncodeError, then it tries unicode(obj). It
then < returns the unicode object | encodes it with the default encoding | ... >.
"""
+
+ # Do not convert unicode to str
+ if isinstance(obj, unicode):
+ return obj
+
try:
# If this works, then _ustr(obj) has the same behaviour as str(obj), so
# it won't break any existing code.
alphas is locale-dependent
The documentation claims that "alphas" is 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. But in reality this is not the case! It is constructed out of string.uppercase and string.lowercase, which are locale-dependent -- and on my system they are full of accented characters! It obviously doesn't make sense to have a programming language whose legal identifiers vary from system to system, so why not just use the literal string 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'? - Kef Schecter
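For comparison, Python's standard library offers locale-independent constants. A small sketch, assuming a Python version that provides `string.ascii_uppercase` and `string.ascii_lowercase`; the name `ASCII_ALPHAS` is made up here and is not part of pyparsing.

```python
import string

# These constants are fixed by definition and do not change with the locale.
ASCII_ALPHAS = string.ascii_uppercase + string.ascii_lowercase
```

Using these constants gives the same 52 characters as the literal string without the risk of a typo in a hand-written alphabet.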
Python 3 version throws exceptions. Unusable.
Python 2 version works. (refer to http://pypi.python.org/pypi/pyparsing/1.5.6)
(edit 2)
I used the installers at http://pypi.python.org/pypi/pyparsing/1.5.6
I tried the Python 3 version. It does not work under Python 3, and running the 2to3 tool did not help.
I tried the version for Python 2.7, and this works.
some ... seems to just take the pyparsing code unchanged, create installers for Python 3 and publish them untested on the PyPI website.
So for Python 3, check out LEPL (edit 2: and Modgrammar) instead of Pyparsing.
However, thanks for Pyparsing for Python 2, which is a nice and working parser.
Simultaneous rules and why tabs must be special on default
1.
I need to check that a message A fulfills a quite complex BNF grammar and that message A's length is not greater than 'l'.
Of course I can parse the message and then check whether the length limit is violated.
The problem is that message A is part of a bigger grammar B, and there might be multiple instances of message A inside B.
Even further, is there an easy way to make an element that has two or more rules that must be valid at the same time?
For the length example above, this could mean something like
    elementWithMaxLength = complexElement & Regex('.{1,%d}'%l, flags=re.S+re.M).suppress()
.. with setParseAction perhaps?
2.
Could the default value for 'keepTabs' be True? It was annoying to find out that tabs are special.
-- kummahiih
Setting whitespace characters after defaulting
I am trying to parse a language which has one-line statements which are separated by newlines and possibly blank lines. In order to parse it I tried setting the default whitespace chars to " \t" and specific whitespace chars for the document parser to " \t\n", but I'm not getting the desired effect. Here's an example:
    from pyparsing import *
    ParserElement.setDefaultWhitespaceChars(" \t")
    statement = Literal("foobar") | Word(nums)
    statements = ZeroOrMore(statement)
    statements.setWhitespaceChars(" \t\n")
    document = StringStart() + statements + StringEnd()
    document.setWhitespaceChars(" \t\n")
    test = "5498\n foobar"
    print test, "->", document.parseString(test)
Which raises an error when it hits the newline char.
However, if I manually set the whitespace chars for all items, it works as expected:
    from pyparsing import *
    statement = Literal("foobar") | Word(nums)
    statement.setWhitespaceChars(" \t")
    statements = ZeroOrMore(statement)
    statements.setWhitespaceChars(" \t\n")
    document = StringStart() + statements + StringEnd()
    document.setWhitespaceChars(" \t\n")
    test = "5498\n foobar"
    print test, "->", document.parseString(test)
Am I misunderstanding these commands, or is there a better way to do this?
-Shawn
[reply from Paul]
Shawn -
Well, there is a little confusion on your part, but there is also a subtle bug in pyparsing that prevents you from doing this the actual correct way. Here is the code as I imagine it should be written.
    from pyparsing import *
    ParserElement.setDefaultWhitespaceChars(" \t")
    statement = (Literal("foobar") | Word(nums)) + LineEnd().suppress()
    statements = ZeroOrMore(statement)
    document = StringStart() + statements + StringEnd()
    test = "5498\n foobar"
    print test, "->", document.parseString(test)
Only a single call to setDefaultWhitespaceChars, no need to set them on individual parse expressions. However, there is a bug in StringEnd that raises an exception when reading both a LineEnd and a StringEnd at the end of the input string (which I will have fixed in the online CVS code in a few minutes). Note that in your original code that did not work, there was no place for the line breaks to be either parsed or skipped over. The setting of whitespace chars only affects the skipping over of whitespace at the beginning of an expression, so setting whitespace to " \t\n" for document only skips over those characters at the very beginning, not during all immediate child elements of document.
I resolved the newline processing question by leaving in your call to setDefaultWhitespaceChars, and then adding an explicit parse expression to read newlines at the end of each statement, since this is the only place where you want to see newlines.
-- Paul
Can setParseAction be used deeper into the parse hierarchy?
    /* dfadsfasdfasdfasdf */
    comment = startcomment + SkipTo(endcomment, include=True)
    comment.setParseAction(replaceWith("COMMENT"))
    grammar = OneOrMore(comment | command1 | command2)
    result = grammar.transformString(inputstring)
setParseAction can be attached anywhere in the hierarchy. Your example should work ok.
As far as handling comments, you can also look at using ignore:
The reason this is important is that comments can appear even in the middle of a command.
PyParsing Support Added to Utility Mill
I thought you guys might be interested. You can now make web-based utilities using the pyparsing module. As an example, I implemented the chemical formula parser example. I think this could be very useful for making a quick utility where you want a user to enter some string to be parsed, and easily use pyparsing to do the work. Let me know what you think.
equality / equivalency between grammars
I might be the first person to ever equality-test pyparsing grammars, but I need to for pyparsing_helper to work right, and it looks like ParserElement.__eq__ wasn't written to support that (as of 1.5.1).
    In [41]: Literal('a') == "a"
    Out[41]: True
    In [42]: Literal('a') == Literal('a')
    Out[42]: False
I've submitted a patch, but in the meantime, here's a monkeypatch.
    import pyparsing
    from pyparsing import ParserElement, ParseBaseException, StringEnd, _ustr

    def _eq_monkeypatch(self, other):
        if isinstance(other, pyparsing.ParserElement):
            # two grammars are equal if all their attributes are equal
            return self.__dict__ == other.__dict__
        elif isinstance(other, basestring):
            # a string is "equal" if the expression matches all of it
            try:
                (self + StringEnd()).parseString(_ustr(other))
                return True
            except ParseBaseException:
                return False
        else:
            return super(ParserElement,self)==other

    pyparsing.ParserElement.__eq__ = _eq_monkeypatch
This was fixed in pyparsing 1.5.2. -- Paul
Generating EBNF-like things from pyparsing grammars?
I'd like to generate some variant of EBNF -- it doesn't need to be too strict -- from a pyparsing grammar. Has anyone tried to do such a thing?
-- Markus
[ reply from Paul ]
Pyparsing's expressions are already self-describing in a quasi-BNF format. For example, here are some of my typical examples (a server name that could be a host name or an IP address), and how they look if printed out:
Since hostname uses different sets of characters for its initial vs. body character, it displays a two-argument format. Unfortunately, the truncation feature clips the significant difference (that the body can contain numeric digits in addition to alpha characters).
Now if we assemble these base expressions into an IP address, we see a couple of other problems:
    >>> ip_addr = integer + '.' + integer + '.' + integer + '.' + integer
    >>> print ip_addr
    {{{{{{W:(0123...) "."} W:(0123...)} "."} W:(0123...)} "."} W:(0123...)}
We really don't want ip_addr to resolve any deeper than its component expressions. For readability's sake, pyparsing allows you to attach a name to an expression (this is not the same as setResultsName):
    >>> integer.setName("integer")
    integer
Now if we rebuild our ip_addr expression and print out its representation, things are a little better:
    >>> ip_addr = integer + '.' + integer + '.' + integer + '.' + integer
    >>> print ip_addr
    {{{{{{integer "."} integer} "."} integer} "."} integer}
Hmmm, still some room for improvement. What we are seeing is the intermediate form that gets created by the '+' operator, which calls ParserElement.__add__(a,b), and returns And([a,b]). Since __add__ can only see two elements at a time, an expression like "a + b + c" returns the nested And([And([a,b]),c]). This is where pyparsing has to do some reshuffling, since the user did not really add any such structure, and would just like things to be processed like And([a,b,c]). So pyparsing has an internal method named streamline() that tries to clean things up a bit. streamline() looks at expressions of like type and tries to collapse unnecessary nesting (while still preserving things like results names, grouping, etc.). If we call it, we can see the results:
    >>> ip_addr.streamline()
    {integer "." integer "." integer "." integer}
Now this is a lot cleaner! (User code rarely needs to call streamline, it is automatically called as part of the logic in parseString.)
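To make the nesting concrete, here is a toy model in plain Python. The `Expr`, `Term` and `And` classes and the `streamline` function below are simplified stand-ins written for this illustration. They only mimic the behaviour described above and are not pyparsing's actual code; in particular, they ignore results names and grouping.

```python
class Expr(object):
    def __add__(self, other):
        # Like '+', this only ever sees two operands at a time.
        return And([self, other])


class Term(Expr):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


class And(Expr):
    def __init__(self, exprs):
        self.exprs = list(exprs)

    def __str__(self):
        return "{" + " ".join(str(e) for e in self.exprs) + "}"


def streamline(expr):
    """Collapse And-inside-And nesting into one flat And."""
    if not isinstance(expr, And):
        return expr
    flat = []
    for sub in expr.exprs:
        sub = streamline(sub)
        if isinstance(sub, And):
            flat.extend(sub.exprs)
        else:
            flat.append(sub)
    return And(flat)


a, b, c = Term("a"), Term("b"), Term("c")
nested = a + b + c
nested_text = str(nested)
flat_text = str(streamline(nested))
```

In this model `nested_text` shows the pairwise structure produced by `+`, and `flat_text` shows the single-level form.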
But this is only helpful for showing the top-level expression. If we want to drill down into the parser, we'll need to peel away the names we gave the sub-expressions. See how this is done in the attached little script:
    from pyparsing import *
    integer = Word(nums).setName("integer")
    ip_addr = integer + '.' + integer + '.' + integer + '.' + integer
    hostname = Word(alphas, alphanums+'_').setName("hostname")
    hostref = hostname | ip_addr
    # internal pyparsing method, rarely called in user code
    hostref.streamline()
    for exprname in "hostref hostname integer".split():
        expr = locals()[exprname]
        e = expr.copy()
        if hasattr(e,"name"): del e.name
        print exprname,'::',e
Prints:
    hostref :: {hostname | {integer "." integer "." integer "." integer}}
    hostname :: W:(abcd...,abcd...)
    integer :: W:(0123...)
This isn't a complete solution, but maybe it will give you some ideas on how to approach your problem.
-- Paul
[Markus again]
Thanks, Paul. I should really learn to control my coding habit, since of course I got impatient while offline and coded something that could have made really good use of streamline(). Anyway, there are quite a few subtleties I'd probably have encountered even with streamline. If someone needs something like this: http://www.tfiu.de/homepage/hacks/#pyparsingToEBNF (warning: much more verbose than Paul's suggestions) -- and I'll gladly expand it if someone actually uses it.
[Ben Liles]
Would it be possible to remove the download url from the pypi record so that easy_install will download the tar.gz uploaded to the pypi? That way it won't have to read from wikispaces. I'm using buildout and cannot specify the full url to get it from.
Try it now. - Paul
Error installing
I see in the README that python 2.3.2 or later is required. I am running 2.3.4 on RedHat, and got this error when I tried to install:
    [root@host pyparsing-1.5.0]# python setup.py install
    Traceback (most recent call last):
      File "setup.py", line 6, in ?
        from pyparsing import __version__
      File "/var/tmp/pyparsing-1.5.0/pyparsing.py", line 2506
        matchOrder += list(e for e in self.exprs if isinstance(e,Optional) and e.expr in tmpOpt)
                             ^
    SyntaxError: invalid syntax
Should I upgrade?
[Mark]
Another syntax error raised during install (pyparsing_py3.py, line 2470)
This is my sys.version:
    2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
    [GCC 4.0.1 (Apple Inc. build 5465)]
The line with the raised syntax error:
    except ParseException as err:
The log:
    $ python setup.py install
    running install
    running build
    running build_py
    creating build
    creating build/lib
    copying pyparsing.py -> build/lib
    copying pyparsing_py3.py -> build/lib
    running install_lib
    copying build/lib/pyparsing.py -> /Library/Python/2.5/site-packages
    copying build/lib/pyparsing_py3.py -> /Library/Python/2.5/site-packages
    byte-compiling /Library/Python/2.5/site-packages/pyparsing_py3.py to pyparsing_py3.pyc
      File "/Library/Python/2.5/site-packages/pyparsing_py3.py", line 2470
        except ParseException as err:
                               ^
    SyntaxError: invalid syntax
    running install_egg_info
    Writing /Library/Python/2.5/site-packages/pyparsing-1.5.2-py2.5.egg-info
Anyway, the greeting.py example does work.
[2009-10-06]
2010/05/15: Same problem on CygWin with Python 2.5. Replace "as" with ",". Or just ignore the error because that module is intended for v3 only.
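If one file has to run under both old and new interpreters, the exception object can be fetched without either syntax. The following is a minimal sketch using `sys.exc_info()` from the standard library. ValueError stands in for ParseException here so that the snippet runs without pyparsing, and the function name `parse_int` is made up for the example.

```python
import sys


def parse_int(text):
    """Return (value, error) without 'except X, err' or 'except X as err'."""
    try:
        return int(text), None
    except ValueError:
        # exc_info() returns (type, value, traceback) of the active exception.
        err = sys.exc_info()[1]
        return None, err


value, error = parse_int("not a number")
```

This only matters for code that must be importable by several interpreter versions; a module meant for Python 3 alone can keep the `as` form.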
I know this is random, but can we have a better page where you can comment or bring up ideas that the owners can have a look at? Or can I email the owners about a new idea?
Post it to the Discussion tab on the Pyparsing Wiki home page. (Anyone can post discussion comments)